ABSTRACT


The World Health Organization has declared the 
present Zika virus epidemic to be a ‘Public Health 
Emergency of International Concern’. The virus 
appears to have spread from Thailand to French 
Polynesia in 2013, and has since infected over a 
million people in the countries of South and Central 
America. In most cases the infection is mild and 
transient, but the virus does appear to be strongly 
neurotropic and the presumptive cause of both birth 
defects in fetuses and Guillain-Barré syndrome in 
some adults. In this paper, the techniques and 
utilities developed in the study of mitochondrial DNA 
were applied to the Zika virus. As a result, it is 
possible to show in a simple manner how a 
phylogenetic tree may be constructed and how the 
mutation rate of the virus can be measured. The study 
showed the mutation rate to vary between 12 and 25 
bases a year, in a viral genome of 10 272 bases. This 
rapid mutation rate will enable the geographic 
spread of the epidemic to be monitored easily and 
may also prove useful in assisting the identification 
of preventative measures that are working, and 
those that are not. 

INTRODUCTION


On 01 February, 2016, the World Health Organization declared 
the emerging Zika epidemic to be a ‘Public Health Emergency of 
International Concern’ (PHEIC), highlighting that this epidemic is 
now considered to be a major threat to the whole world (WHO, 
2016). Their statement of intent includes the lines: ‘Appropriate
research and development efforts should be intensified for Zika
virus vaccines, therapeutics and diagnostics’; and, ‘National
authorities should ensure the rapid and timely reporting and sharing
of information of public health importance relevant to this PHEIC’.

As a consequence, it can be expected that many research
institutions will increase their studies into Zika and
related viruses and many new scientific papers will appear
in the coming months. At the same time, it is expected that
many more RNA sequences of the virus will appear in the
public domain.

Also, as a result of the rapidly increasing importance of the 
Zika virus, it is likely that scientists and physicians who normally 
would not study the genetics of a virus might start examining 
the newly available data. 

The Zika virus is a Flavivirus carried by mosquitoes and was 
originally found in a Rhesus monkey placed in the Zika forest of 
Uganda in 1947, as described by Haddow et al. (1964). Over 
the last 60 years it has been the cause of epidemics in several 
African countries. However, in 2010 the virus spread to parts of 
Asia, in particular to Thailand (Fonseca et al., 2014; Haddow et 
al., 2012), and by 2013 had reached French Polynesia (Baronti 
et al., 2014). Since then there has been an explosive epidemic 
affecting the populations of many countries in both South and 
Central America. At the present time this epidemic shows no 
signs of abating. 

In the many small epidemics that have occurred, there has 
been no indication of the virus causing anything but mild and 
transient infections. In the recent epidemic in Polynesia, how- 
ever, cases of central nervous system damage have been 
observed and described as being a form of Guillain-Barré dis- 
ease (Korff, 2013; Winer, 2014). In the current Brazilian epidemic, 
the emphasis has been on the possibility of an association with 
birth defects, especially microcephaly, resulting from maternal 
infection with Zika in the first and second trimester of pregnancy 
(Mlakar et al., 2016). The presumptive link between Zika infection 
and microcephaly is now looking more and more likely. Further 
cases of Guillain-Barré syndrome have also been seen. 

The Zika virus is closely related to the viruses of Yellow, Den- 
gue and West Nile fever, all of which cause significant illness 
and mortality. However, there are many other flaviviruses that 
are less well known and their hosts include horses, sheep, bats, 
birds and many other animals. A paper in 1998 listed over 70 
different flaviviruses (Kuno et al., 1998), with new ones continu- 
ing to be identified (Moureau et al., 2015). 

All flaviviruses appear to have much the same structure. The
mature virus particles, virions, are about 50 nm in diameter and
icosahedral in shape. Modern electron microscopy can show
virions in considerable detail (Zhang et al., 2013; Zhou, 2014).
The outer part is formed by an envelope overlying a phospholipid
bi-layer membrane and the core contains a single stranded
RNA molecule of about 10k bases.

In the mature Zika virion, the RNA molecule, which en- 
codes a polyprotein, is described as having 10 272 bases, or 
3 424 3-base codons for specific amino acids. The transla- 
tion of bases to functional codons is not perfect, but for 
analysis purposes it has become accepted to describe the 
structure of the molecule in this manner: 

starting with MKN ... and ending with ... GVL 

(i.e., the codons for: methionine, lysine, asparagine ... and
glycine, valine, leucine).
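
The relationship between bases, codons and the one-letter amino acid notation used above can be sketched in a few lines of Python. This is only an illustration, not the author's utility: it assumes the standard genetic code, writes bases in the DNA alphabet (T rather than U), and uses made-up example codons for the two ends of the polyprotein, since most amino acids are encoded by more than one codon.

```python
from itertools import product

# Standard genetic code in compact form: codons are enumerated with the
# bases in the order T, C, A, G; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(bases: str) -> str:
    """Translate a coding sequence, three bases at a time, into amino acids."""
    if len(bases) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[bases[i:i + 3]] for i in range(0, len(bases), 3))


GENOME_BASES = 10272              # length quoted in the text
CODON_COUNT = GENOME_BASES // 3   # 3424 codons

# Hypothetical codons, chosen only to illustrate the two ends of the molecule.
start = translate("ATGAAAAAC")    # "MKN"
end = translate("GGTGTGCTG")      # "GVL"
```

The codon table is built from the compact 64-letter string rather than typed out by hand, which keeps the sketch short and makes the ordering assumption explicit.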

The polyprotein is a linear assembly of both structural and 
non-structural genes. The structural genes are for the envelope, 
membrane and capsid, and the non-structural genes are usu- 
ally NS1, NS2A, NS2B, NS3, NS4A, NS4B, and NS5 (Bollati et 
al., 2010). For the purposes of this paper, this simple explana- 
tion will suffice. 

The envelope and membrane genes define how the outer 
part of a virion is conformed. This outer part is important as it 
acts as an antigen for antigen-antibody reactions and also in 
the interaction between the virus and entry receptors when a 
virion attempts to enter a cell. Consequently, mutations affecting 
the construction of the envelope and membrane are probably 
more significant than mutations in other parts of the genome, 
and perhaps of greater influence when it comes to possible 
changes in virulence. 

It remains unclear as to which cells, if not all cells, in humans 
are susceptible to invasion with the Zika virus. However, the cells 
of the central nervous system do appear to be especially vulner- 
able (Mlakar et al., 2016). In all instances, the process appears to 
be the same, whereby a virion attaches itself to the entry recep- 
tors on the outer surface of the cell and then enters the cell in a 
process described as endocytosis (Perera-Lecoin et al., 2014). 

Once inside a human cell, the envelope and membrane 
separate from the core. The virus then hi-jacks the cellular 
apparatus for its own purposes. The polyprotein is copied and 
cleaved into its constituent parts and daughter virions are pro- 
duced, each containing its own copy of the polyprotein (Clark & 
Harris, 2006). 

At present, there are few drugs that prevent replication of the 
Zika virus, and the older and well-established antiviral drugs, 
such as amantadine, which are active against the influenza 
virus, are not helpful against flaviviruses (Oxford et al., 1970). 
However, a great deal of work is being done to find new antiviral 
drugs (Sampath & Padmanabhan, 2009). It is interesting to 
note that a traditional Chinese remedy, Xiyanping, was used for 
its anti-viral properties in the treatment of a recent case of Zika 
(Deng et al., 2016). 

In relation to the present epidemic, the ability of the Zika virus 
to enter cells of the placenta and the central nervous system is 
particularly important. For now, however, it is unclear whether or 
not these invasions are more dependent on the strain of the 
virus, the genetic make-up of the host, or some other factors. 
Once the virus has crossed the placental barrier to enter the
fetus or the blood-brain barrier to enter the central nervous
system, it is likely that the usual antigen-antibody reactions are 
lessened and the virus is able to proliferate more easily. It is 
also unknown as to how long it might take for the virus to be 
cleared from the fetus or central nervous system, although it 
does seem likely that virus replication can continue in these 
areas for many months. 


MATERIALS AND METHODS


GenBank database 
The RNA sequences for the Zika virus in the public domain can 
be found in the GenBank database of the National Institutes of
Health (Benson et al., 2013). At present (March 2016), there are
16 complete sequences from viruses collected in Africa and 
Asia before the start of the present epidemic and 17 sequences 
produced since. Details of these sequences are given in Table 1. 
The corresponding page on the GenBank database for a 
given sequence can be found by using a URL of the form: 
http://www.ncbi.nlm.nih.gov/nuccore/KU744693. Each page
gives the amino acid list and nucleotide base FASTA file for the 
RNA sequence. However, a GenBank page contains no real 
explanation as to what each list or file might mean and for this 
reason the author has developed a pair of Zika virus utilities that 
allow the user to compare one sequence with another. 
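
As a small illustration of the URL form described above, the page address for any accession number in Table 1 can be built by appending the accession to a fixed prefix. The function name and the prefix constant are hypothetical, and the prefix assumes the GenBank nucleotide pages are served under www.ncbi.nlm.nih.gov/nuccore/.

```python
# Assumed prefix for GenBank nucleotide record pages.
GENBANK_NUCCORE = "http://www.ncbi.nlm.nih.gov/nuccore/"


def genbank_url(accession: str) -> str:
    """Return the GenBank page URL for an accession number such as 'KU744693'."""
    return GENBANK_NUCCORE + accession.strip().upper()


# Two accession numbers taken from Table 1.
urls = [genbank_url(acc) for acc in ("KJ776791", "KU744693")]
```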


Zika virus utilities 

In conjunction with this paper, two simple utilities were prepared 
and can be found on the author’s website in the form of two 
webpages (www.ianlogan.co.uk/zikapages/zika.htm). From there 
the user can choose either the Amino Acid Analyser or the
Nucleotide Base Analyser.

The Amino Acid Analyser has in its source file copies of the 
amino acid lists for all complete RNA sequences found in the 
GenBank database and a small JavaScript program that allows 
the user to compare any sequence against any other. The 
results are displayed as a list of amino acid changes. 

The Nucleotide Base Analyser has in its source file copies of 
the FASTA files for the complete RNA sequences of all se- 
quences in the current epidemic. Again, a small JavaScript 
program enables the user to compare two sequences and show 
the mutational differences between them. 
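
The kind of comparison performed by the two utilities can be sketched as follows. The author's programs are written in JavaScript and their internals are not described here, so this Python version is an assumption about the general approach: it takes two sequences of equal length (already aligned) and reports each difference in the notation used in this paper, for example A7900G for bases or M2634V for amino acids. The function name and the short example sequences are hypothetical.

```python
def list_differences(reference: str, sample: str) -> list[str]:
    """List position-by-position differences between two aligned sequences.

    Each difference is written as <reference letter><1-based position><sample letter>.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"{ref}{pos}{alt}"
        for pos, (ref, alt) in enumerate(zip(reference, sample), start=1)
        if ref != alt
    ]


# Made-up nine-base example: differences at positions 1 and 9.
differences = list_differences("ATGAAAAAC", "GTGAAAAAT")  # ['A1G', 'C9T']
```

Because the function only compares letters, the same code serves for nucleotide bases and for amino acid lists; sequences containing insertions or deletions would first need a proper alignment step, which this sketch does not attempt.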

Although the webpages can be viewed with any commonly 
used web browser, the author recommends Mozilla Firefox
as it allows the user to alter the size of the text area, if needed. 

It is the author’s intention to keep these webpages up-to-date 
as new Zika RNA sequences appear on the GenBank database. 


RESULTS 


Non-synonymous amino acid changes observed in the 
present epidemic 

A mutation that causes a non-synonymous change of an amino 
acid is often considered to be significant. However, if the change is 
between amino acids of similar size and polarity, there is probably 
no effective change in the functioning of the target protein. 

Table 1 Zika RNA sequences in the GenBank database (01 March 2016)

No. Accession no. Country of origin Date of collection
African sequences 

1 LC002520 Uganda 1947 
2 HQ234499 Malaysia 1966 
3 HQ234500 Nigeria 1968 
4 KF383116 Senegal 1968 
5 KF383115 CAR 1968 
6 HQ234501 Senegal 1984 
7 KF268948 CAR 1976 
8 KF268949 CAR 1980 
9 KF268950 CAR 1980 
10 KF383117 Senegal 1997 
11 KF383118 Senegal 2001 
12 KF383119 Senegal 2001 
Asian sequences 

13 EU545988 Micronesia 2007 
14 JN860885 Cambodia 2010 
15 KU681082 Philippines 2012 
16 KU681081 Thailand 2014 
Current epidemic 

Brazilian reference sequence 

17 KJ776791 Polynesia 2013 
18 KU365779 Brazil 2015 
19 KU365778 Brazil 2015 
20 KU365777 Brazil 2015 
21 KU365780 Brazil 2015 
22 KU312312 Suriname 2015 
23 KU501215 Puerto Rico 2015 
24 KU509998 Haiti 2014 
25 KU321639 Brazil 2015 
26 KU527068 Brazil 2015 
27 KU647676 Martinique 2015 
28 KU501216 Guatemala 2015 
29 KU501217 Guatemala 2015 
30 KU707826 Brazil 2015 
31 KU497555 Brazil 2015 
32 KU740184 China 2016 
33 KU744693 Venezuela 2016 


The amino acid changes shown by the Zika RNA sequences in
the present epidemic are listed in Table 2. The table demonstrates
that by using this method the sequences can be split into
12 different strains with between 0 and 25 amino acid changes.

The mutation M2634V is common to all virus strains that
have come from countries in South and Central America and is
caused by the base mutation A7900G. However, as this mutation
is found in the NS5 gene, it is unlikely to be of significance
as to the virulence or general behavior of the Zika virus. The
NS5 gene is involved in the replication of new virions and is not
a structural gene (Zhao et al., 2015). It is perhaps too early to
say that this mutation has absolutely no effect, but for the moment
the M2634V mutation can be seen as a useful marker to
the present epidemic.
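
The correspondence between the amino acid change M2634V and the base mutation A7900G follows from simple arithmetic, sketched below. The sketch assumes that base 1 is the first base of the first codon of the polyprotein, so that codon n occupies bases 3(n - 1) + 1 to 3n; the function names are hypothetical and only the two codons named in this paper are looked up.

```python
def codon_base_range(aa_position: int) -> tuple[int, int]:
    """Return the first and last base positions (1-based) of a given codon."""
    first = 3 * (aa_position - 1) + 1
    return first, first + 2


def amino_acid_position(base_position: int) -> int:
    """Return the 1-based codon number that contains a given base."""
    return (base_position - 1) // 3 + 1


def apply_base_change(codon: str, codon_first_base: int,
                      base_position: int, new_base: str) -> str:
    """Replace one base of a codon, given the base's position in the genome."""
    offset = base_position - codon_first_base
    if not 0 <= offset <= 2:
        raise ValueError("base position lies outside this codon")
    return codon[:offset] + new_base + codon[offset + 1:]


first, last = codon_base_range(2634)                  # (7900, 7902)
mutated = apply_base_change("ATG", first, 7900, "G")  # "GTG"

# Only the two codons mentioned in the text: ATG (methionine), GTG (valine).
TWO_CODONS = {"ATG": "M", "GTG": "V"}
change = f"{TWO_CODONS['ATG']}2634{TWO_CODONS[mutated]}"  # "M2634V"
```

Base 7900 is therefore the first base of codon 2634, which is why a single change from A to G at that position turns the codon ATG into GTG.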


Base mutations in samples collected in the present epidemic 
While there are relatively few non-synonymous mutations in the
virus strains collected in the present epidemic, there are many
more synonymous mutations (i.e., mutations that do not produce
a change of amino acid), and as a result a phylogenetic
tree can be constructed.

Figure 1 shows the phylogenetic tree produced by using the
mutations from the 17 complete Zika sequences from the present
epidemic currently found in the GenBank database. This figure
shows the virus samples can be separated into 15 different strains,
considering the sequence pairs KU365777/KU365780 and
KU365779/KU707826 as being from two strains.
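
The idea of separating sequences into strains can be sketched as grouping together those sequences that carry exactly the same set of changes. The example below is an illustration under stated assumptions rather than the method used to draw Figure 1: it uses the amino acid changes of six sequences as listed in Table 2, and the function name is hypothetical. The same grouping applies equally to sets of base mutations.

```python
from collections import defaultdict

# Amino acid changes for six sequences, as listed in Table 2.
CHANGES = {
    "KJ776791": set(),
    "KU365778": {"M2634V"},
    "KU365779": {"M2634V"},
    "KU707826": {"M2634V"},
    "KU365777": {"M2634V", "N2778D"},
    "KU365780": {"M2634V", "N2778D"},
}


def group_into_strains(changes_by_accession):
    """Group accession numbers that share an identical set of changes."""
    strains = defaultdict(list)
    for accession, changes in changes_by_accession.items():
        strains[frozenset(changes)].append(accession)
    return dict(strains)


strains = group_into_strains(CHANGES)
strain_count = len(strains)  # 3 distinct strains among these six sequences
```

Grouping on amino acid changes alone gives fewer strains than grouping on all base mutations, because sequences that differ only by synonymous mutations fall into the same group.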


Estimation of the Zika virus mutation rate 

The data presented in Figure 1 show that actual mutations vary 
between 9 and 64 for the sequences collected during the pre- 
sent epidemic. This number was calculated by considering the 
mutations that have occurred since the outbreak of the epi- 
demic in Brazil, and Figure 1 suggests the use of a hypothetical 
Brazilian Reference Sequence (BRS) to describe a possible 
sequence for the original strain arriving in Brazil. 

The number of mutations found in a sequence appears to
increase with the date on which the original sample was
collected. The early sequences show between 9 and 30 mutations,
whereas the two latest sequences, KU740184 and KU744693,
show 30 and 64 mutations, respectively. The latter sequence is
from a sample collected on 6 February, 2016, and shows that
the Zika virus continues to mutate at a rapid rate.

As the present epidemic can be considered to have started in
Polynesia in 2013 (Baronti et al., 2014) and has now lasted
about 2.5 years (i.e., late 2013 to early 2016), the mutation rate
appears to vary between about 12 and 25 mutations a year. The
genome of the Zika virus is normally considered to be a
polyprotein of 10 272 nucleotide bases, so the mutation rate can
also be considered as changing between 0.12% and 0.25% of
the RNA polyprotein each year.
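
One reading of the arithmetic behind these figures is sketched below. It assumes that the lower and upper bounds come from the two latest sequences (30 and 64 mutations) divided by an epidemic duration of 2.5 years; the paper does not state the calculation explicitly, so this is an interpretation, and the function names are hypothetical.

```python
GENOME_BASES = 10272    # genome length quoted in the text
EPIDEMIC_YEARS = 2.5    # late 2013 to early 2016


def rate_per_year(mutations: int, years: float = EPIDEMIC_YEARS) -> float:
    """Mutations accumulated per year over the duration of the epidemic."""
    return mutations / years


def percent_of_genome(rate: float, genome_bases: int = GENOME_BASES) -> float:
    """Express a yearly mutation rate as a percentage of the genome length."""
    return 100 * rate / genome_bases


low = rate_per_year(30)    # 12.0 mutations a year
high = rate_per_year(64)   # 25.6 mutations a year

low_percent = round(percent_of_genome(low), 2)    # 0.12
high_percent = round(percent_of_genome(high), 2)  # 0.25
```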

It is not possible, using the data presently available, to pro- 
vide a more accurate value for the mutation rate. However, the 
suggested rate of 12 to 25 mutations a year would appear to be 
a suitable starting point for further studies. 


DISCUSSION 
Present epidemic 


The decision by the World Health Organization to declare a
Public Health Emergency in February 2016 due to the threat of
a Zika virus pandemic may be thought a pessimistic move.
However, the evidence appears to indicate that the Zika virus is no
longer restricted to localized habitats nor largely dependent on
the monkey as its host. It now covers a much larger area in South
and Central America where it is wholly dependent on the human
as its host. Unfortunately, there appears to be little, if any, herd
immunity against the virus in the populations of these countries,
despite the closely related Dengue virus being prevalent.

Table 2 Non-synonymous amino acid changes found in sequences from the present Zika virus epidemic

GenBank accession no.
-Country of origin-Date of collection   Amino acid changes

KJ776791-Polynesia-2013      (none)
KU365778-Brazil-2015         M2634V
KU365779-Brazil-2015         M2634V
KU707826-Brazil-2015         M2634V
KU365777-Brazil-2015         M2634V N2778D
KU365780-Brazil-2015         M2634V N2778D
KU312312-Suriname-2015       M166T T769A M2634V
KU501215-Puerto Rico-2015    I80T A2611V M2634V
KU527068-Brazil-2015         K940E T1027A M1143V T2509 M2634V
KU647676-Martinique-2015     D107E R1118W I1226T M2634V T3353A
KU501216-Guatemala-2015      V346I G894A M2074L M2634V K2694R R3045C
KU501217-Guatemala-2015      V346I G894A M2074L M2634V K2694R R3045C
KU497555-Brazil-2015         S550T L1259F M2634V E2831V
KU740184-China-2016          D107E D445G I1285V M2634V T2749 V2787A
KU509998-Haiti-2014          Y916H H1857Y I2295M I2445M M2634V
KU321639-Brazil-2015         Y916H H1857Y I2295M I2445M M2634V
KU744693-Venezuela-2016      E76D V323A I442L V503A D520A L612V
                             H613D V620G A623G F739I A794G D795G
                             S970W R1005W T1050A C1107S R1118Q D1856E
                             H1857Y S1867R D1938G I2295M A2313P T2317S
                             D2419E I2445M M2634V S2807A H2809K E2831D
                             P2833A N2974I M2975T V3334A

The change M2634V (Methionine to Valine), common to all sequences, occurs in the NS5 gene and is the result of the base mutation: A7900G, which
changes the codon from ‘ATG’ to ‘GTG’.

The sudden spread of Zika to South and Central America 
does not appear to have been due to any change in the mos- 
quito vector or anything known to make the Zika virus more 
virulent. Rather, it appears related to the fact that infected peo- 
ple are now able to fly rapidly from country to country, thereby 
spreading the disease very easily. This means there is little to 
stop the epidemic continuing to spread to other populations that 
also have low levels of herd immunity against the virus. 

The absence of mosquitoes and the low incidence of
person-to-person spread of the virus will probably mean the epidemic will
not spread in countries at higher southern and northern latitudes.
From evidence obtained so far, however, it would appear likely 
the epidemic is only at its earliest stage and any suggestions as 
to what might happen remain speculative (Bewick et al., 2016). 


Zika phylogenetic tree 

The two utilities prepared for this study show that many dis- 
tinct strains of the Zika virus now exist, even though the present 
epidemic is less than three years old. When considering just the 
non-synonymous mutations in the RNA, it is possible to define 
11 strains in the present epidemic. However, a more detailed
examination looking at the actual mutations of the available
sequences distinguishes 15 strains. As more data are made 
available, it is expected that the number of identifiable strains 
will increase. The phylogenetic tree shown in Figure 1 suggests 
the beginning of a geographical spread of the associated virus 
strains, with distinct strains now coming from Martinique, Gua- 
temala, Puerto Rico and Suriname, whilst Brazil continues to 
show a mix of strains. 

Estimation of the Zika mutation rate 

The data used in building the phylogenetic tree can also be
used to estimate the mutation rate of the Zika virus, which
appears to vary between about 12 and 25 mutations per year,
equivalent to 0.12%-0.25% of the RNA mutating each year. This
rate is very high when compared to the human DNA mutation 
rate, where a period of perhaps 250 000 years might be ex- 
pected (Logan, 2015). Nonetheless, it is not really appropriate 
to compare RNA mutations against DNA chromosomal muta- 
tions as DNA replication is a self-correcting process, whereas 
RNA duplication is liable to many sorts of errors. 

However, the clear evidence of a high mutation rate in the 
Zika virus will allow for the present epidemic to be tracked in a 
fairly simple manner, and should be helpful in identifying where 
local mosquito prevention initiatives are working and where they 
are not. 


Figure 1 Phylogenetic tree of the 17 Zika RNA sequences from samples collected in the present epidemic (tree titled ‘Updated ZIKA Tree’, 1 March 2016)

Two missing mutations, indicated by the brackets (C10074T in sequence KU365780-Brazil-2015 and G9327A in sequence KU321639-Brazil-2015),
were probably caused by technical errors. The position of a hypothetical Brazilian Reference Sequence (BRS) is marked on the tree. The BRS is used
in the two utilities, the Amino Acid Analyser and Nucleotide Base Analyser, prepared for this paper and available at: www.ianlogan.co.uk/zikapages/zika.htm.


Zika infection complications

A particular feature of the present epidemic is the presumptive
link to the high incidence of fetal abnormalities and cases of
Guillain-Barré syndrome. These two complications appear to be
caused in very different ways, with fetal abnormalities possibly
being the result of direct fetal infection, and Guillain-Barré
syndrome cases possibly due to an exaggerated auto-immune
response (Cao-Lormeau et al., 2016; Willison et al., 2016).

In the author’s opinion, however, both conditions may result 
from the same underlying cause, in which the virus is able to 
cross the normally impenetrable placental and blood-brain 
barriers. How this happens is unclear, but it might just be a 
matter of a person getting a very high initial infection, possibly 
by having been bitten by a physically large carrier mosquito, or 
being bitten by several carrier mosquitoes in a very short period 
of time. 

A study using the West Nile virus (Styer et al., 2007) showed 
that whilst most of the inoculum from a mosquito bite remains 
localized in the skin, there is always a significant initial viraemia. 
In this respect, a recent report from Slovenia (Mlakar et al., 
2016) showed the X-rays of an affected fetus having numerous 
calcifications in the placenta and brain. Whilst it is unproven, it 
would seem possible that these lesions result from localized 
‘viral plaque formation’ associated with an initial viraemic 
spread. A similar clinical picture is seen in tuberculosis. Al- 
though this disease is caused by a bacterium and not a virus, 
the resulting X-ray picture of localized calcifications is well- 
recognized and is termed miliary tuberculosis (Khan et al., 2011; 
Yang et al., 2015). 

It is also possible that the risk of developing complications
from the Zika virus may reflect the genetic differences between 
sufferers and the general population. At the present stage of our 
knowledge, however, there is no indication of what particular 
differences might be important. 


CONCLUSIONS 

This study shows in a simple way how sequencing data from
samples of the Zika virus available in the public domain can be
collected and analyzed. Using these data, it is possible to
construct a phylogenetic tree and show that in the present epidemic
there are already many identifiable virus strains. The data also
show that the Zika virus has a high mutation rate.

This short paper raises as many questions as it tries to an- 
swer. The present epidemic is from the Zika virus, but Yellow 
Fever cases are rising in Africa and Dengue affects millions of 
people each year. Thus, further pandemics caused by 
flaviviruses, other than Zika, pose a continuing threat. 